FOOD 101 - Intro to Food

Diksha Kataria


Churros

library(readr)
library(ggplot2)
library(dplyr)
## Warning: package 'dplyr' was built under R version 3.5.2
## 
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
## 
##     filter, lag
## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union
library(methods)
library(stringi)
library(keras)
## Warning: package 'keras' was built under R version 3.5.2
library(glmnet)
## Loading required package: Matrix
## Loading required package: foreach
## Loaded glmnet 2.0-16

Loading the Food-101 Dataset

The dataset I decided to use for my image classification project is called “Food-101”. The full dataset has 101 categories of food with 1,000 images in each category; however, I only use a subset of 10 categories to make the processing easier. Of the 1,000 images in each food category, the 750 training images have deliberately not been cleaned, so as to make the classification algorithm more robust. The categories I use in my image classification task, with the number of images in each category, are:

input_dir <- "~/Desktop/Statistical Learning/Project/food-101/myimages"
class_vector <- dirname(dir(input_dir, recursive = TRUE))
cbind(table(class_vector))
##                         [,1]
## apple_pie               1000
## cheesecake              1000
## churros                 1000
## donuts                  1000
## french_fries            1000
## grilled_cheese_sandwich 1000
## pancakes                1000
## pizza                   1000
## samosa                  1000
## tacos                   1000

I start off by loading in my dataset, which has already been processed and embedded in a separate file. Along with that, I load in the metadata for my dataset.

#loading in the data set
img_data <- read.csv("~/Desktop/Statistical Learning/Project/my-image-data.csv")
X2 <- read_rds("~/Desktop/Statistical Learning/Project/my-image-embed.rds")

Sample Images

To make sure the images were loaded correctly, let’s look at a few sample images from the category “apple pie”.

#looking at the images
paths <- dir(input_dir, recursive = TRUE, full.names=TRUE)
par(mar = c(0,0,0,0))
par(mfrow = c(4, 6))
set.seed(1)
for (i in 1:24) {
  plot(0,0,xlim=c(0,1),ylim=c(0,1),axes= FALSE,type = "n")
  Z <- image_to_array(image_load(paths[i], target_size = c(224,224)))
  rasterImage(Z/255,0,0,1,1)
}

Here’s another sample of images, this time from the category “samosa”.

par(mar = c(0,0,0,0))
par(mfrow = c(4, 6))
set.seed(1)
for (i in 8001:8024) {
  plot(0,0,xlim=c(0,1),ylim=c(0,1),axes= FALSE,type = "n")
  Z <- image_to_array(image_load(paths[i], target_size = c(224,224)))
  rasterImage(Z/255,0,0,1,1)
}

As you can see, the images haven’t been cleaned and contain some noise (wrong labels, intense colors), which makes this classification task particularly challenging.

Transfer Learning Model

The model I used for transfer learning here is ResNet-50, which was trained as part of the ImageNet challenge. The corpus used for the challenge contained images of size 224x224 pixels, which is why I decided to go with the Food-101 dataset as opposed to my original choice, CIFAR-10.
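The precomputed embeddings in my-image-embed.rds could have been produced with a pipeline along these lines (a sketch, not the exact preprocessing used to build the .rds file; the `embed_image` helper is hypothetical):

```r
library(keras)

# Load ResNet-50 without its classification head; global average
# pooling turns each 224x224 image into a 2048-dimensional vector.
base <- application_resnet50(weights = "imagenet", include_top = FALSE,
                             pooling = "avg")

# Hypothetical helper: embed one image file into a feature vector.
embed_image <- function(path) {
  img <- image_to_array(image_load(path, target_size = c(224, 224)))
  img <- array_reshape(img, c(1, dim(img)))
  img <- imagenet_preprocess_input(img)  # ResNet-50's expected scaling
  predict(base, img)[1, ]
}
```

Stacking the resulting vectors row by row would give a matrix like X2, with one embedding per image.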

I decided to add two dense layers prior to the final layer and noticed that the neural network seemed to perform better if I start with a large number of units and then gradually decrease the number of units. At every dense layer, I also decided to use batch normalization (which I came across as part of the Keras documentation).

The basic idea behind batch normalization is to normalize the inputs of each layer in such a way that they have a mean output activation of zero and a standard deviation of one. This regularization also reduces overfitting and helps the model generalize better (according to the original paper on batch normalization). During hyperparameter tuning I also saw that smaller dropout values worked better than larger ones in the intermediate layers, so I went with a dropout rate of 0.5. I also kept the learning rate small (so that we do not have any exploding gradients).
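As a toy illustration of the transform on a single feature (ignoring the learned scale and shift parameters that batch normalization also applies):

```r
# Standardize a toy batch to zero mean and unit variance, as batch
# normalization does per feature (using the population variance to
# match the original formulation).
x <- c(2, 4, 6, 8)
x_hat <- (x - mean(x)) / sqrt(mean((x - mean(x))^2))
mean(x_hat)    # 0
mean(x_hat^2)  # 1
```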

Lastly, I decided to go with the “sigmoid” activation, since it seemed to perform better than “relu” and “selu”. I present my final version of the model below.

#creating transfer learning model
X_train <- X2[img_data$train_id == "train",]
y_train <- to_categorical(img_data$class[img_data$train_id == "train"])

model <- keras_model_sequential()
model %>%
  
  layer_dense(units = 256, input_shape = ncol(X_train)) %>%
  layer_batch_normalization() %>%
  layer_activation(activation = 'sigmoid') %>%
  layer_dropout(rate = 0.5) %>%
  
  
  layer_dense(units = 128) %>%
  layer_batch_normalization() %>%
  layer_activation(activation = 'sigmoid') %>%
  layer_dropout(rate = 0.5) %>%

  layer_dense(units = ncol(y_train)) %>%
  layer_activation(activation = "softmax")

model %>% compile(loss = 'categorical_crossentropy',
                  optimizer = optimizer_rmsprop(lr = 0.001 / 2),
                  metrics = c('accuracy'))
model
## Model
## ___________________________________________________________________________
## Layer (type)                     Output Shape                  Param #     
## ===========================================================================
## dense (Dense)                    (None, 256)                   524544      
## ___________________________________________________________________________
## batch_normalization (BatchNormal (None, 256)                   1024        
## ___________________________________________________________________________
## activation (Activation)          (None, 256)                   0           
## ___________________________________________________________________________
## dropout (Dropout)                (None, 256)                   0           
## ___________________________________________________________________________
## dense_1 (Dense)                  (None, 128)                   32896       
## ___________________________________________________________________________
## batch_normalization_1 (BatchNorm (None, 128)                   512         
## ___________________________________________________________________________
## activation_1 (Activation)        (None, 128)                   0           
## ___________________________________________________________________________
## dropout_1 (Dropout)              (None, 128)                   0           
## ___________________________________________________________________________
## dense_2 (Dense)                  (None, 10)                    1290        
## ___________________________________________________________________________
## activation_2 (Activation)        (None, 10)                    0           
## ===========================================================================
## Total params: 560,266
## Trainable params: 559,498
## Non-trainable params: 768
## ___________________________________________________________________________

After about 10 epochs the model seemed to be overfitting: the training accuracy kept going up while the validation accuracy plateaued.

#fit data
history <- model %>%
  fit(X_train, y_train, epochs = 10)
plot(history)
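One way to guard against this overfitting, sketched below with keras’s built-in early-stopping callback, would be to hold out part of the training data and stop once the validation loss stops improving (the 20% split and patience of 2 are my assumptions, not the setup used above):

```r
# Stop training once validation loss fails to improve for two
# consecutive epochs, instead of running a fixed number of epochs.
history <- model %>%
  fit(X_train, y_train,
      epochs = 30,
      validation_split = 0.2,
      callbacks = list(callback_early_stopping(monitor = "val_loss",
                                               patience = 2)))
```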

Results

Following are the best results I got from the various versions of the models I ran:

y_pred <- predict_classes(model, X2)
tapply(img_data$class == y_pred, img_data$train_id, mean)
##   train   valid 
## 0.94600 0.84375

Confusion Matrix

Here is the confusion matrix that resulted from the predictions of my model.

y = img_data$class
table(y[img_data$train_id == "valid"], y_pred[img_data$train_id == "valid"])
##    
##       0   1   2   3   4   5   6   7   8   9
##   0 306  11   6   4   1  33  19   9  10   1
##   1  31 319   3  12   2  17   7   4   4   1
##   2   8  10 356   7   6   3   4   1   4   1
##   3  23   8  12 329   4   7   7   2   3   5
##   4   2   0   2   0 378  10   0   0   2   6
##   5  10   4   6   2  34 322   5   2   9   6
##   6  29  25   2   4   1  20 309   5   4   1
##   7   6   1   5   1   2   4   2 370   1   8
##   8  22   4   3   4   4  12   2   1 338  10
##   9   3   2   1   2   3  23   2   6  10 348

The confusion matrix presents some very interesting findings. First, let’s look at the class names associated with each class so we can read the matrix.

categories <- c("apple_pie", "cheesecake", "churros", "donuts", "french_fries", "grilled_cheese_sandwich", "pancakes", "pizza", "samosa", "tacos")
class_names <- 0:9
df <- cbind(class_names, categories)
df
##       class_names categories               
##  [1,] "0"         "apple_pie"              
##  [2,] "1"         "cheesecake"             
##  [3,] "2"         "churros"                
##  [4,] "3"         "donuts"                 
##  [5,] "4"         "french_fries"           
##  [6,] "5"         "grilled_cheese_sandwich"
##  [7,] "6"         "pancakes"               
##  [8,] "7"         "pizza"                  
##  [9,] "8"         "samosa"                 
## [10,] "9"         "tacos"

As we can see, some of the categories that the classification model confused repeatedly were: apple pie and grilled cheese sandwich (the most commonly confused pair), cheesecake and pancakes, and apple pie and cheesecake. Now let’s pull up some images that the model misclassified.
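The most-confused pairs can also be read off programmatically by zeroing the diagonal of the confusion matrix and looking at its largest entry (a sketch reusing y, y_pred, and categories from the chunks above):

```r
# Rebuild the validation confusion matrix, then blank out the
# correct classifications so only errors remain.
conf <- table(y[img_data$train_id == "valid"],
              y_pred[img_data$train_id == "valid"])
off <- conf
diag(off) <- 0

# Row/column indices of the largest off-diagonal count, mapped back
# to category names: the (true, predicted) pair confused most often.
idx <- which(off == max(off), arr.ind = TRUE)
categories[idx]
```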

Sample Predictions

Here are a few of the classifications that the model got wrong:

par(mfrow = c(2, 3))
id <- which(y_pred != y)
for (i in id[2:7]) {
  par(mar = rep(0, 4L))
  plot(0,0,xlim=c(0,1),ylim=c(0,1),axes= FALSE,type = "n")
  Z <- image_to_array(image_load(paths[i], target_size = c(224,224)))
  rasterImage(Z/255,0,0,1,1)
  text(0.5, 0.1, label = categories[y_pred[i] + 1L], col = "red", cex=2)
}

And here are some images that the model classified with the highest probabilities:

y_probs <- predict(model, X2)
id <- apply(y_probs, 2, which.max)
par(mfrow = c(2, 4))
for (i in id[1:8]) try({
  par(mar = rep(0, 4L))
  plot(0,0,xlim=c(0,1),ylim=c(0,1),axes= FALSE,type = "n", asp=1)
  Z <- image_to_array(image_load(paths[i], target_size = c(224,224)))
  rasterImage(Z/255,0,0,1,1)
})

Visualization of the Model

Lastly, we can visualize the weights of the first dense layer.

layer <- get_layer(model, index = 1)
par(mar = c(0,0,0,0))
par(mfrow = c(16,16))
for(i in 1:256){
  wg <- layer$get_weights()[[1]][,i]
  im <- abs(wg) / max(abs(wg))
  plot(0,0,xlim=c(0,1), ylim=c(0,1), axes=FALSE, type='n')
  rasterImage(im,0,0,1,1,interpolate = FALSE)
  box()
}

I tried to visualize the embeddings using PCA as well; however, the principal component analysis function prcomp() was very slow on my computer and repeatedly caused the R session to abort.

#pca <- as_tibble(prcomp(X2)$x[,1:2])
#pca$y <- categories[y + 1L]
#ggplot(pca, aes(PC1, PC2)) +
#geom_point(aes(color = y), size = 4) +
#labs(x = "", y = "", color = "class") +
#theme_minimal()
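A faster alternative, assuming the irlba package is available, would be to compute only the first two principal components with a truncated decomposition rather than the full one that prcomp() performs:

```r
library(irlba)
library(ggplot2)

# prcomp_irlba() computes just the n leading components, which is
# much cheaper than prcomp() on a wide embedding matrix like X2.
pca <- as.data.frame(prcomp_irlba(X2, n = 2)$x)
pca$y <- categories[img_data$class + 1L]

ggplot(pca, aes(PC1, PC2)) +
  geom_point(aes(color = y), size = 4) +
  labs(x = "", y = "", color = "class") +
  theme_minimal()
```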

Conclusion

The transfer learning classification model was able to correctly classify about 85% of the food images. Considering the quality of the images, this result seems fairly reasonable to me.